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ABSTRACT 

We present SymPix, a special-purpose spherical grid optimized for efficient sampling of rotationally 
invariant linear operators. This grid is conceptually similar to the Gauss-Legendre (CL) grid, aligning 
sample points with iso-latitude rings located on Legendre polynomial zeros. Unlike the GL grid, 
however, the number of grid points per ring varies as a function of latitude, avoiding expensive 
over-sampling near the poles and ensuring nearly equal sky area per grid point. The ratio between 
the number of grid points in two neighbouring rings is required to be a low-order rational number 
(3, 2, 1, 4/3, 5/4 or 6/5) to maintain a high degree of symmetries. Our main motivation for this 
grid is to solve linear systems using multi-grid methods, and to construct efficient preconditioners 
through pixel-space sampling of the linear operator in question. The GL grid is not suitable for 
these purposes due to its massive over-sampling near the poles, leading to nearly degenerate linear 
systems, while HEALPix, another commonly used spherical grid, exhibits few symmetries, and is 
therefore computationally inefficient for these purposes. As a benchmark and representative example, 
we compute a preconditioner for a linear system with both HEALPix and SymPix that involves the 
operator D -|- where B and D may be described as both local and rotationally invariant 

operators, and N is diagonal in pixel domain. For a bandwidth limit of £max = 3000, we find that 
SymPix, due to its higher number of internal symmetries, yields average speed-ups of 360 and 23 for 
B^N“^B and D, respectively, relative to HEALPix. 

Subject headings: Methods: numerical — methods: statistical — cosmic microwave background 


1. INTRODUCTION 

Unlike the plane, it is impossible to construct a regu¬ 
lar discretization of the sphere. Instead, every conceiv¬ 
able spherical grid comes with its own set of trade-offs, 
emphasizing one or more features at the cost of others. 
Thus, there is no such thing as a perfect spherical grid, 
but the optimal grid instead depends sensitively on the 
application under consideration. 

In this paper, we will restrict our attention to high- 
resolution grids designed for fast and accurate spherical 
harmonic transforms (SHTs). In such cases, the primary 
consideration is that the grid must allow for efficient 
^(^max) SHTs, where .^max denotes the upper harmonic 
space bandwidth limit of the field in question, as opposed 
to the scaling resulting from naive brute-force 

summation. This requires the use of Fast Fourier Trans¬ 
forms (FFTs) in the longitudinal direction, which in turn 
implies that i) sample points must be placed on a set of 
iso-latitude rings, and ii) sample points within each ring 
must be equidistant. However, there is still flexibility in 
choosing the latitude of each ring {9j G [0,7r]), the num¬ 
ber of grid points along each ring (nj), and the initial 
offset of each ring {(joj). 

Three popular spherical gri ds are the equiangular grid , 
the Gauss-Legend re grid (e.g., Doroshkevich et al.|2005 ), 
and HEALPisp (Gorski et al. 2005). Of these, the 
equiangular grid is the most straightforward, simply de¬ 
fined by evenly spaced grid points {0i, (pi) in both direc¬ 
tions. This grid is typically used for geographical maps. 
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and it is therefore also called a geographical grid. 

Similarly, the standard Gauss-Legendre grid has a con¬ 
stant number of grid points per ring. However, the ring 
latitudes 9j are defined such that PN,i„g„{cos9j) = 0, 
where P„ is the Legendre polynomial of degree n. This 
simple modification allows efficient spherical harmonic 
analysis to machine precision, and the grid is thus opti¬ 
mized for spherical harmonics transforms. 

Both of these grids suffers from a massive over- 
sampling of the polar regions {9 close to 0 or tt) com¬ 
pared to the equatorial region (6* « 7r/2), and this renders 
them sub-optimal, and sometimes even useless, for cer¬ 
tain practical applications. An important example is the 
solution of discretized and bandwidth limited linear sys¬ 
tems. If there is a large number of sample points within 
the correlation length implied by fmax, the system be¬ 
comes degenerate and numerically unstable. Grids with 
nearly constant pixel areas perform much better than 
grids with strongly varying pixel areas for this type of 
applications. 

One example of such grids is HEALPix, which is short 
for “Hierarchical Equal Area and Latitude Pixelization”. 
This grid has by construction both constant area pixel 
area per pixel and grid points located on iso-latitude, 
and is as such a good general-purpose grid. However, 
this generality comes at a cost in terms of spherical har¬ 
monics precision, as well as a low level of internal pixel 
symmetries. 

The latter point is particularly important for our ap¬ 
plications. Consider a function of two grid points, hi and 
h 2 , that is both localized and rotationally invariant, 


fini,h2) 


/(hi • 712 ) if arccos(hi • 112 ) < kA 
0 otherwise. 


( 1 ) 
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where A denotes the average distance between two neigh¬ 
bouring grid points. Thus, / is assumed identically zero 
if the two grid points are separated by more than k 
grid units. In our applications, which employ multi-grid 
and/or preconditioning methods, we need to evaluate / 
for all relevant pairs (ni,n 2 ). Furthermore, because / 
typically is computationally expensive, it is important to 
minimize the total number of function evaluations, and 
large speed-ups can be gained by exploiting symmetries 
and caching. 

For HEALPix, / needs to be evaluated O(fc^fVpix) 
times, because the angular distances between neighbour¬ 
ing grid points are all different, up to handful of overall 
symmetries. In contrast, for the equi-angular and Gauss- 
Legendre grids only 0{k'^ ^/N^) evaluations are needed. 
Since the number of grid points is constant for every ring, 
we only need to evaluate / for the first grid point on ev¬ 
ery ring, accounting for all its neighbours, after which all 
function evaluations along the same ring will be given by 
symmetry. 

In this paper, we construct a novel spherical grid called 
SymPix that combines the spherical harmonics transform 
precision of the Gauss-Legendre grid with the nearly uni¬ 
form sample point distances of HEALPix, while at the 
same time maintaining a high degree symmetries within 
each ring, ensuring that fully sampling fini ■ 112 ) scales 

as 0(fc2yiVpA). 


exploiting that contributions from higher-ordered har¬ 
monics are numerically irrelevant. An explicit bound 
on the numbe r of pixels required for ma chine precision 


was derived b y Prez eau & Reinecke (20101, and Reinecke 
'& Seljebotn ( 2013|) used this to construct the reduced 


Gauss-Legendre grid. Explicitly, for a given ring located 
at some latitude 0, Equation defines the maximum in 
such that Yimid, </>) does not vanish. The minimum num¬ 
ber of pixels on that ring is then given by 2 m 1, result¬ 
ing in a longitudinal sample frequency that exceeds the 
Nyquist frequency. 


2.2. Tiling 

As discussed in Section |T] our primary usecase is eval¬ 
uating a function /(hi, 712 ) for all possible pairs (hi, 772 ), 
but with the restriction that / is zero unless hi and 
712 are close together. To avoid unnecessary searches 
over vanishing pairs, we therefore partition our grid into 
a set of /c X fc-sized tiles, where k is chosen such that 
/(hi,h 2 ) = 0 unless hi and h 2 are either in the same 
tile or in two neighbouring tiles. Thus, finding all rele¬ 
vant partner points for a given grid point simply amounts 
to a closest neighbour tile look-up. However, this also 
requires that the number of rings is divisible by k (let¬ 
ting A/ings > ^max + 1 if necessary), and that a set of k 
consequtive rings must have the same number of sample 
points. We will refer to each such set of k rings as a band. 


2. THE SYMPIX GRID 
2.1. Ring layout basics 

The main role of the SymPix grid is that of a support¬ 
ing grid in internal multi-grid and/or preconditioning cal¬ 
culations, and maintaining high numerical precision is 
therefore essential. For this reason, we adopt the Gauss- 
Legendre latitudinal ring layout as the basis of our grid. 
This provides support for both spherical harmonic syn¬ 
thesis (i.e., transforming from harmonic coefficients to 
pixel space) and analysis (transforming from pixel space 
to harmonic coefficients) to machine precision, by virtue 
of having an exact quadrature rule on the form 

aem= [ Y*{h)f{h)dVL = ^Y*{hi)f{fii)wi, ( 2 ) 

Jo. 


where Wi is a set of quadrature weights. By placing 
rings exclusively on the zeros of the I'max’th polynomial, 
one is guaranteed that P£_^^^+i(cos 0i) = 0, and the dis¬ 
cretized held is algebraically bandwidth limited to har¬ 
monic modes with £ < £niax- 
Next, we need to include enough sample points along 
each ring to fully resolve all spherical harmonic modes 
with £ < £inax- Formally speaking, this requires 2A/ings 
grid points per ring. However, this requirement is 
somewhat counter-intuitive by suggesting massive over- 
sampling of the polar regions compared to the equatorial 
region. And, indeed, our intuition is correct: The spher¬ 
ical harmonic modes Yi^iO^cj)) are very close to zero in 
the polar regions for high £ and tti, and these are the only 
modes that can cause high-frequency variation in the lon¬ 
gitudinal direction. For this reason, th e libsharp SHT 
package (Reinecke & Seljebotn 20131 omits I/in(^,</) 
whenever 


\/— 2m cos 0 — £niax sin 9 > max(100, 0.01£niax), (3) 


2.3. Enforcing symmetries 

The main remaining step is to define the number of 
tiles per band. On the one hand, it must satisfy the 
minimum number of pixels given by Equation On the 
other hand, it may be beneficial to increase it beyond 
this, in order to increase symmetries within and across 
bands. For instance, if we sample / from Equation for 
all point-pairs within a tile, the result can obviously be 
re-used for all tiles in that band, since all between-point 
angular distances are conserved between tiles. Similarly, 
we can reuse results between neighbouring tiles within 
the same band due to longitudinal symmetry. 

In addition, we exploit the additional degrees of free¬ 
dom in choosing the number of tiles to ensure symmetries 
with respect to latitudinally neighbouring tiles. Specif¬ 
ically, we require that the number of tiles can increase 
from one band to the next only by a factor of exactly 3, 
2, I, 4/3, 5/4, or 6/5. Additionally, at least two bands in 
a row must have the same number of tiles, except for the 
polar bands. Finally, in order to avoid special cases we 
allow no equatorial ring (i.e., we insist that A/jngs is an 
even number), and, purely conventionally, the location of 
the first grid point in a given ring is chosen to be half the 
pixel distance within that same ring. Together, these re¬ 
quirements ensure that the pattern of neighbouring tiles 
repeats itself with a short period, and the total number 
of different cases to evaluate scales as ©(Ai-ing) rather 
than O(Apix). We employ a dynamic programming al¬ 
gorithm to find the optimal number of tiles per band, 
subject to t he constraints defined above, as detailed in 
Section |2.5[ An example grid corresponding to fc = 2 
tiling is illustrated in Figure 

2.4. Memory layout and pixel ordering 
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Figure 1. Geometric layout of SymPix sample points, implementing a cylindrical projection of the sphere. Each rectangle indicates a tile 
of (in this case) 2x2 sample points. For white tile-bands, the bands above and below have the same number of tiles, and angular distances 
between sample points in a given tile and sample points in the neighbouring tiles are therefore constant throughout the band. Function 
evaluations depending only on angular distances may therefore be cached and reused. Colored tile-bands increment the number of tiles 
by a factor of 2 (red), 4/3 (blue), 5/4 (yellow), 6/5 (green), and 4/3 again (blue) towards the equator. For these bands, the neighbouring 
tile relationship repeats itself (as indicated by shading), and there are still only a few cases that need to be computed and cached for each 
band. 
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Figure 2. Memory ordering of SymPix sample points. Note that the resolution is lower than in Figure]^ Within each band the pixel 
order increases first latitudinally, i.e., along the 9 direction. This ensures that access within the same tile is local in memory, and there are 
no discontinuities along each ring, which is convenient for SHTs. Additionally, to support efficient distributed programming, we interleave 
Northern and Southern bands, such that they naturally are assigned to the same node without explicit additional book-keeping. 
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Optimal-SymPix-Grid: 


Inputs: 

•^max 

Band-limit of field to represent 

k - 

Tile size 

Output: 

n - 

number of bands 


number of tiles in each band 

Auxiliary: 


minimum number of tiles for band i 

Ci,t - 

the cost of the best partial solution for 
bands 0 to i when assuming Ti = t 

Pi,t - 

“previous-pointers”; when assuming Ti - 
the solution for bands 0 to z has Tj—i = 


Treat unassigned Ci^t as oo and unassigned Pi^Tq as —1 
-brings ^ ^max + 1 rounded Up to next multiple of 2k 

n i -brings/2A; 

Find 9j for each ring j as for Gauss-Legendre grid 
T -4— max(100, ^max/100) 
for each i G {0,..., n — 1}: 

Find minimum m that satisfies Equation for 
Qj ■<— \{2m + l)//c] 

To •!- min({2*3-’5'' 12*3^5'= > ao,i & N, j & N ,k & N}) 

Ci,To ■*“ C^O — 

for each i from 1 to n — 1: 

for each tprev such that Ci-i^tprev ^ 

for each x G {3, 2,1, 4/3, 5/4, 6/5} such that xtprev £ A/”: 
t ^ 3?tprev 

if + («i - t)^ < Ci,t 

and Oi < t < doi 
and (Ti-i,tp„v = *prev or x = 1): 

Ci^t ■<— C + (ftj — 

Pi,t ^ iprev 
T„_i <- argmin((C„,t) 
for each i from n — 2 to 1: 

Ti <- Pi+i_Ti+i 


Figure 3. Dynamic programming algorithm for optimizing the 
SymPix grid layout. In summary, the algorithm considers all 
possible solutions, and employ look-up tables of partial solutions 
for bands 0 to i — 1 when considering band i. The condition 
i.tprev = tprev ensures that at least two bands in a row have 
the same number of tiles, except (possibly) for the first two rows, 
Ti + To. 


While the above constraints fully define the geometric 
properties of the SymPix grid, they do not imply a canon¬ 
ical memory layout or “pixel ordering”. To fix this, we 
adopt two additional rules, both designed to maximize 
memory access efficiency and programming convenience. 

First, the Northern and Southern hemispheres are 
band-wise interleaved. That is, we first list the Northern¬ 
most polar band, followed by the Southern-most polar 
band, followed by the second Northern band and so on. 
The main advantage of this organization lies in conve¬ 
nient distributed programming across multiple comput¬ 
ing nodes; interleaving the two hemispheres ensures that 
the same node can readily exploit North-South symme¬ 
tries. 

Second, grid points are latitudinally major-ordered 
within a given tile, i.e., the pixel ordering increases most 
rapidly along the 0 direction. While the order within 
each tile could have been in any direction, this choice 
implies that pixel ordering is continuous across longitu¬ 
dinal tile borders, which is particularly convenient for 
SHTs. 

Figure provides an example of the resulting pixel 
ordering. Note that the resolution is lower than the cor¬ 
responding illustration in Figure [l] 


2.5. Grid optimization 

We end this section by describing the algorithm used 
to optimize the number of of tiles in each band, subject 
to the constraints defined in Section [2731 We will in the 
following only discuss the Northern hemisphere, as the 
Southern hemisphere is given directly by symmetry. 

To initialize the algorithm, the user must provide a tile 
size k and a total number of rings fVrings, where Nj-ings 
must divisible by both 2 and k. The grid will be able 
to accurately represent fields that are band-limited at 
f’max = -brings —1- Together, these parameters specify the 
angular resolution of the grid, and correspond in princi¬ 
ple to the HEALPix Aside parameter. We then number 
the bands by * = 0,... Abands — 1 = Agings/ {2k) — 1, such 
that each band consists of k rings. We also define to 
be the minimum number of tiles in each band subject to 
the constraint that the southmost ring within the band 
fulfills Equation!^ 

Deriving the optimal SymPix grid is now equivalent 
to determining the number of tiles, T^, for each band. 
For this optimization process we adopt the following cost 
function. 


c(To, ■ • • ,r/Vb,„d=-i) = 'Y^G{Ti) = - ai)^, ( 4 ) 

i i 

which must be minimized subject to 


5 1 123 
T, ^l5’4’3’ ’ ’ 


( 5 ) 


Additionally, we initialize the recursion by defining Tg as 
the smallest number larger than ag that is only a product 
of the factors 2, 3 and 5, and for computational speed we 
add the heuristic (or modification to the cost function) 
that Ti < 3ai, i.e., that no band should be over-pixelized 
by more than three times the Nyquist frequency. 

The actual calculation is then a simple exercise in dy¬ 
namic programmin g, as described in a ny standard text 
on algorithms (e.g. Cormen et al.J1989 ). Our implemen¬ 
tation is summarized m I'lgure 1^ which has a worst- 
case computational complexity of O(nan) = = 

C’(Apix), and the same worst-case memory use. Due to 
the low computational complexity and the fact that the 
optimization only needs to be performed once per grid 
resolution, we do not present benchmarks this operation; 
its computational cost is negligibly small for our pur¬ 
poses. 


3. BENCHMARKS AND COMPARISONS 

Before considering specific applications, we first char¬ 
acterize the basic performance of the SymPix grid in 
terms of computational efficiency and numerical accu¬ 
racy. 


3.1. Geometric efficiency 

We start by quantifying the geometric efficiency of our 
grid, as characterized by the overall number of grid points 
and the pixel area uniformity. For these tests, we con¬ 
sider an example grid with ^max = 2000 and fc = 4, 
sufficient to discretize a spherical field with an angular 
resolution of 15’ FWHM. Running the algorithm sum¬ 
marized in Figure with these input parameters yields 
a SymPix grid witn 5.6 • 10® grid points. 
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Figure 4. Number of SymPix grid points per ring as a function of 
latitude (solid line). The dotted line shows g, , i.e., the same quan¬ 
tity f or the reduced Gauss-Legendre grid (|Reinecke & Seljebotn| 

[2OI3I 1. - 



e 


Figure 5. SymPix pixel area as a function of latitude in units of 
47r/Npix (solid line). For the HEALPix grid, pixel areas are per¬ 
fectly uniform (dotted line), while significant over-sampling occurs 
close to the poles for the SymPix grid. 


In Figure we compare the number of SymPix grid 
points per ring with the optimal number of p oints per 


ring used by the re duced Gauss-Legendre grid (Reinecke 


& Seljebotn 2013). The ratio between the solid and 


dashed lines thus indicates the amount of longitudinal 
over-sampling implied by the SymPix grid. Except very 
close to the poles, where there are very few points in 
terms of absolute numbers, this ratio is never larger than 


1.35. 

A similar illustration is provided in Figure where 
we plot the pixel area as a function of latitude, dehning 
pixel borders strictly along longitudes and latitudes. The 
pixel area is given in units of the pixel area averaged over 
the full sky, i.e., 47r/7Vpix, such that a perfectly uniform 
pixelization, like HEALPix, corresponds to a constant 
value of unity. Overall, we see that the effective pixel 
areas vary at most by 20 % relative to the average, except 
near the poles, where the normalized area may be as low 


as 0.1. 


Figure shows a histogram of normalized pixel areas, 
and we see that the vast majority of grid points have a 
normalized area between 0.9 and 1.1. The tail below 0.8 
corresponds to the over-pixelized polar caps, and these 
contain only 0.4% of the total number of grid points for 
this particular example. Overall, the SymPix grid im¬ 
plies an over-sampling of about 11% compared to the re¬ 
duced Gauss-Legendre grid, which is acceptable for our 
purposes. 


3.2. Accuracy of spherical harmonic quadrature 

Next, we compare the numerical accuracy of spherical 
harmonics transforms as implemented on the SymPix, 
HEALPix and rednced Gauss-Legendre grids. This test 
is carried out through the following experiment: 

1. We draw a fiducial signal a = {a^m} in spherical 
harmonic domain, band-limited by some £max- All 
spherical harmonics coefficients are drawn from the 
same zero mean and unit variance Gaussian distri¬ 
bution, such that no angular scales dominate the 
real-space field. 

2. We project this signal onto the respective grid sam¬ 
ple points by spherical harmonic synthesis. 



Figure 6. Histogram of normalized SymPix pixel areas. The 
tail extending below 0.8 corresponds to polar oversampling, and 
contains about 0.4% of the total number of pixels for this particular 
grid setup. 

3. We convert the real-space signal back to harmonic 
space through spherical harmonic analysis, includ¬ 
ing multipoles up to fmax, to recover a. 

4. We repeat this procedure Nsim times, and summa¬ 
rize the results in terms of the resulting round-trip 

(i) _-^(i) (i) 

errors, 

Before presenting the results, we note that no fun¬ 
damental band-limit and/or resolution parameter Ngide 
exist for HEALPix for a given angular resolution. For 
instance, changing the band-limit £„iax will add/reduce 
aliasing for all scales. A quantitative head-to-head com¬ 
parison at a given resolution is therefore difficult, as ad¬ 
ditional parameter tuning can affect the results. With 
this caveat in mind, we present in Table results for 
three different band-limits, £max = {2.0,2.5, S.OjVgide 
with Aside = 256, quoting both the maximum and mean 

(i) 

errors as evaluated over all error coefficients e\f^. Each 
case includes = 100 simulations, and the SymPix 
tile size is fixed at A: = 8. 

Starting with the highest bandwidth case, £max = 
3 Aside, we first note that the regular Gauss-Legendre 
grid is the only one grid that achieves overall machine 
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Table 1 

Comparison of different grids in terms of number of pixels and accuracy of spherical harmonic analysis 


^max 

Grid 

Parameter 

Afpix 

A ■ /ArHEALPix 

Max. error 

Mean error 

CPU time for SHT (ms) 

511 

HEALPix 

A^side = 256 

786 432 

1.00 

2.1 ■ 10-2 

2.9 ■ 10-5 

160 


SymPix 

«max = 511 

390 656 

0.50 

7.8 ■ 10-3 

8.1 ■ 10-'^ 

67 


Gauss-Legendre 

^max = 511 

524 288 

0.67 

CO 

1 

o 

1 

o 

00 

66 

639 

HEALPix 

A^side = 256 

786 432 

1.00 

1 

o 

1.3 ■ 10-3 

219 


SymPix 

^max = 639 

591232 

0.75 

CO 

1 

O 

1.1 ■ 10-® 

118 


Gauss-Legendre 

^max = 639 

819 200 

1.04 

1.2 ■ 10-32 

3.2 ■ 10-34 

118 

767 

HEALPix 

A^side = 256 

786 432 

1.00 

1.6 ■ 10“ 

6.8 ■ 10-2 

287 


SymPix 

^max = 767 

838 656 

1.07 

4.0 ■ 10-2 

4.8 ■ 10-® 

188 


Gauss-Legendre 

^max = 767 

1179 648 

1.50 

1.0 ■ 10-32 

3.8 ■ 10-34 

188 


Note. — The HEALPix resolution is kept constant at A^side = 256, while the spherical harmonic bandlimit varies over €max = 
{2.0, 2.5, 3.0} Aside- The SymPix and Gauss-Legendre band-limits are identical to the spherical harmonic band-limit. 



Figure 7. Spherical harmonic round-trip error as a function of multipole, summarized in terms of maximum (dotted lines) and mean 
(solid lines) errors, averaged over both harmonic quantum number m and Nsim = 100 simulations. Black lines show results for a SymPix 
grid with ^max = 735 and tile-size 8; red lines show results for a HEALPix grid with Aside = 256 and ^max = 735; and blue lines show 
results for a regular Gauss-Legendre grid with ^max = 628. All grids have roughly the same number of grid points, Apix ~ 780 000. 
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Figure 8. Error induced by under-sampling (aliasing) as a function of multipole in terms of average errors, averaged over both harmonic 
quantum number m and = 100 simulations. The experimental setup is the same as in Figure[d but the spherical harmonic bandwidth 

limit varies between £max = 512 (solid), £max = 735 (dashed), and £max = 900 (dotted). 


precision, with a mean error of and a maxi¬ 

mum error of 0(10“^^). For comparison, the correspond¬ 
ing mean and maximum SymPix errors are 0(10“°) and 
respectively, while HEALPix achieves 0(10“^) 
and 0(1) for this high bandwidth case. Reducing the 
bandlimit to .^max = improves the latter by about 

two orders of magnitude. 

However, the statistics listed in Table H provide only a 
very coarse comparison, because the roum-trip errors are 
highly scale dependent. In FigureJ^we therefore plot the 
error as a function of multipole, ^choosing the SymPix 


and HEALPix bandlimits such that the corresponding 
grids roughly match a HEALPix Ngide = 256 grid in 
terms of total number of sample points. For SymPix, this 
corresponds to .^max = 735, and for the Gauss-Legendre 
grid it is fmax = 628. 

Starting with the Gauss-Legendre grid (blue lines), we 
see that the error reaches machine precision up to the 
bandwidth limit; at higher multipoles no information is 
carried by the grid. In contrast, the SymPix grid reaches 
machine precision up to £ « O.S^max, while the error in¬ 
creases more smoothly at higher multipoles. However, 
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even though the high-£ error increase is smooth, it is 
still exponential, and the mean and maximum statistics 
listed in Table are therefore strongly dominated by the 
small-scale errors. Thus, by virtue of deriving its main 
geometric grid layout from the Gauss-Legendre grid, we 
see that the numerical performance of the SymPix grid is 
excellent on large and intermediate angular scales, and 
the cost of its superior symmetry properties primarily 
comes in the form of sub-optimal small-scale residuals. 
For comparison, the HEALPix errors are roughly con¬ 
stant at 0(10“^) to 0(10“^), and vary only weakly with 
angular scale. Note that in all cases the errors can be 
reduced by iteration techniques, essentially using least 
squares minimization to find the spherical harmonic sig¬ 
nal with least power that projects exactly to the map, 
and employing the result of spherical harmonic analysis 
as a preconditioner. 

The large errors seen for the Gauss-Legendre grid 
above ima.x is due to under-sampling or, equivalently, 
aliasing. In Figure we study this effect directly by 
varying the spherical harmonics bandwidth limit between 
^max = 512, 735 and 900; note, however, that the actual 
grid resolution parameters are kept fixed at the above val¬ 
ues, and the higher resolutions enforced here therefore no 
longer match the respective grid properties. Considering 
first the Gauss-Legendre grid with a SHT bandlimit of 
^max = 512, we see, as expected, that the errors reach 
machine precision at all scales. However, for the higher 
bandlimits, ^max = 735 and 900, both of which are higher 
than the grid resolution of £1^^^ = 528, the errors saturate 
at a multipole below the grid resolution. To be specific, 
the critical multipole is 2£|^^x ~ -^m^x) corresponding to 
the well-known aliasing limit from standard Fourier the¬ 
ory. However, at lower multipoles no aliasing is observed 
for the Gauss-Legendre grid, which implies that it is fully 
robust with respect to under-sampling, given a known 
bandlimit. 

In comparison, the corresponding HEALPix errors are 
non-local, in the sense that increasing the spherical har¬ 
monics bandlimit increases the errors at all angular 
scales: The dotted line (£max = 900) lies consistently 
higher than the dashed line (£max = 735), which in turn 
lies consistently higher than the solid line (£max = 512). 
The HEALPix grid is thus not robust against under¬ 
sampling, and it is very important to choose a grid reso¬ 
lution appropriate for the bandwidth of the signal under 
consideration, which in several applications may imply 
over-sampling the signal. 

The SymPix grid performance lies, as expected, be¬ 
tween those of Gauss-Legendre and HEALPix. On 
large angular scales, it achieves numerical precision, 
while on small scales the aliasing increases exponentially 
with multipole, and eventually reaches similar levels as 
HEALPix. 

3.3. Computational speed of SHTs 

Before ending this section, we compare the perfor¬ 
mance of the SymPix, HEALPix and Gauss-Legendre 
grids in terms of computational speed. The rightmost 
column in Table[T]lists the GPU time for each of the cases 
considered above in units of wall-clock milli-seconds, 
while Figure [^presents a head-to-head comparison of the 
SymPix and HEALPix grid performance as a function of 
A^pix. All benchmarks were performed using libsharp 



Figure 9. Comparison between spherical harmonic transforms 
cost as performed with SymPix and HEALPix as a function of 
-^pix? plotted in terms of their ratio (black solid line). The dashed 
line shows the ratio between the number of grid point rings. 

on a single Intel Core i7 Q840 at 1.87 GHz (SSE2); for 
full details including CPU times in absolute numbers, 
we refer the interested reader to IReinecke & Seliebotnl 

([20T^. 

Overall, SymPix perform similarly to the Gauss- 
Legendre grid, and both execute about 30 % faster than 
HEALPix. This latter difference may be explained by the 
fact that the HEALPix grid points form a zig-zag pattern 
in which every other ring is longitudinally shifted by half 
a pixel width. This implies a grid point organization that 
comprise about 30 % more rings than Gauss-Legendre 
and SymPix grids, which exhibit more regular longitu¬ 
dinal pixel organizations. This is relevant, because the 
computational complexity of SHTs scales as 

CSHT = 0(lVring£Lx) + 0(A^pix log (6) 

'ring 

= l^('^max) + log£max)- 

The first term represents the cost of computing the as¬ 
sociated Legendre polynomials for each ring, and dom¬ 
inates the second term, which accounts for evaluating 
Fast Fourier Transforms (FFTs) along each ring. Thus, 
the number of grid points per ring is not critical for the 
overall speed of SHTs, while the total number of rings is. 

In addition, SymPix grids have by construction rings 
with pixel numbers that are only products of 2, 3 
and/or 5, which ensures efficient Fast Fourier Transforms 
(FFTs). In contrast, many HEALPix rings have pixel 
numbers that includes large primes, and therefore the 
Bluestein algorithm must be employed for these. This 
effect is more important for lower resolution grids, for 
which the cost of FFTs is relatively higher. 

4. APPLICATIONS 

We now turn our attention to practical applications, 
and in particular to the construction of efficient precon¬ 
ditioners. Before doing that, however, we consider a sim¬ 
pler application, namely real-space convolution, in order 
to build up intuition regarding the relevant operations. 
We emphasize that the purpose of this preliminary dis¬ 
cussion is not to provide a real-world alternative to spher- 









ical harmonic transform s, or the methods pres ented by 


Eisner & Wandelt (20111 and Sutter et al. (2012) for such 
convolutions, but simply to quantity the computational 
efficiency of the SymPix grid on a simple and intuitive 
application. 


4.1. Spherical convolution 

The convolution of a spherical image / with a kernel b 
is given by the spherical surface integral 

gin) = / &(h,TO)/(m)dflA. (7) 

J 477 

In our case we assume an azimuthally symmetric ker¬ 
nel, and bin, rh) therefore depends only on the distance 
between h and rh, such that 


gin) = / bin ■ rh)fim)dn^. (8) 

J 477 

This integral is most commonly performed in spher¬ 
ical harmonic domain, turning full-sky convolution 
into coefficient-wise multiplication with a correspond¬ 
ing transfer function, b^, which is given by the Legendre 
transform of 6(h-m). These computations are dominated 
by the spherical harmonic transforms, and therefore have 

a computational scaling of OiN^l^) = O(^max)- 
If b is spatially narrow compared to the required pix- 
elization, as is usually the case, one could instead con¬ 
sider the pixel-domain convolution by evaluating 

JVpix 

aini) = (9) 

i=i 

where the convolution kernel reads 


Kx) = 


e=o 


2 i+l 

iTT 


bgPeix). 


( 10 ) 


One would then make the approximation that bihi-hj) = 
0 whenever sample points i and j are more than k sample 
point distances apart, as discussed in Section 

For HEALPix, almost all sample point distances are 
different, and b must therefore be evaluated OiNpi^k"^) 
times. The computational complexity of pixel-domain 
convolution on the HEALPix grid therefore scales as 
0(-/Vpix ^max) = C>(fc^^max)) which is clearly inferior 
to the harmonic approach both in terms of speed and 
accuracy. With SymPix, however, the large number of 
symmetries allows us to reduce the computational com¬ 
plexity to ©(fc^iVpix -b v^iVpix £max) = C>(fc^^pix): One 
simply needs to choose a tile size k such that only sam¬ 
ple point pairs within a tile and between neighbouring 
tiles must be considered. Then for, each band of k rings, 
bin ■ rh) only needs to be evaluated for the first few tiles 
of the band, as other distances within the same band will 
be identical within the remainder of the band. 

The speed-up for evaluating all necessary 6(h-m), when 
approximating bin ■ rh) = 0 whenever h and m are not 
in neighbouring tiles, are given in Table In addition 

to scaling better than the OiN^l^ ) spherical harmonic 
transforms, this approach should also be easier to paral¬ 
lelize and implement efficiently on a GPU. 


Table 2 

CPU time and theoretical speed-up for evaluating b{n • m) 


^max 

CPU time 
[sec] 

Speed-Up 

[factor] 

3000 

9.8 

732 

1500 

3.6 

335 

750 

1.4 

149 

375 

0.74 

70 

188 

0.50 

26 

100 

0.31 

14 


Note. — We have approximated b{h • m) = 0 when¬ 
ever h and rh are not in neighbouring tiles. The third col¬ 
umn shows the number of non-zero b{h • rh), which scales 
as 0(/c^A^pix), divided by the number of elements we had 
to compute when making use of the SymPix symmetries, 
which scales as 0{k'^y/W^). In this example we have cho¬ 
sen k = S. 


Note that yet another method for spherical con¬ 
volution with a symmetric kernel has been imple¬ 


mented in the ARKCoS code (Eisner & Wandelt 2011 


Sutter et al. 2012), with a computational scaling o: 
C’(fcCaxlogimax) = 0(/c Vpix log Apix). Whether a 
SymPix-based convolution would improve relative to 
their work for relevant resolution parameters and accu¬ 
racy requirements remains to be explored. 


4.2. Preconditioner construction for linear systems 

Finally, we are in the position to discuss the applica¬ 
tion of the SymPix grid to our main usecase, namely 
for solving linear systems involving rotationally invari¬ 
ant operators in pixel domain, either through multi-grid 
methods or to construct efficient preconditioners. The 
simplest example of such a system is 

YBY^x = b, (11) 


where Y, as usual, is the matrix corresponding to spher¬ 
ical harmonic synthesis and B is a diagonal matrix in 
spherical harmonic domain, = biSi^iiSmm' ■ The 

product YBY^ is a pixel domain operator with strong 
spatial couplings within the correlation length implied by 
b. Of course, this particular system could have been triv¬ 
ially solved by converting to spherical harmonic domain, 
which would diagonalize the coefficient matrix. How¬ 
ever, if there are more terms in the operator, this is 
no longer possible, and iterative solvers like Conjugate 
Gradients or multi-level algorithms are needed. In these 
cases SymPix is useful to construct preconditioners or 
smoothers. 

Our own main interest lies in drawing constrained 
Gaussian re alizations of the CMB sky by using a multi¬ 
level solver (Seljebotn et al.|[2014). T his may performed 


by solving the following linear system ( Jewell et al.|2004 
Wandelt et al.|[2004 Elriksen et al.||2UUl| ( 


Yi(D -f BYobsN-W^bsB)Y( x = r. 


( 12 ) 


where D and B are diagonal matrices in spherical har¬ 
monic domain, characterized by transfer functions di and 
bi, N-i is a diagonal (inverse noise covariance) matrix 
in pixel domain, pixelized on some external grid 9i, and 
r is a stochastic term that depends on the data set in 
question. 

Two different spherical grids are involved in system. 
First, the outermost spherical harmonics transform, Yi, 
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denotes synthesis to a grid of our own choosing. We will 
use a SymPix grid of resolution £max for this operator in 
the following. The inner transform, Yobs, is determined 
by some external experiment, and is thus not flexible. 
Here we will assume that this operator is defined on a 
full-sky HEALPix grid of N^ide = 2048, typical for the 
CMB maps publishe d by the Planck experiment (Planck 
Collaboration||201^ . 

(Jt course, from the viewpoint of the overall linear sys¬ 
tem, the details of any individual operator is irrelevant, 
and the only crucial point is that the combined opera¬ 
tor remains the same. In order to speed up the calcula¬ 
tions through use of symmetries, we therefore substitute 
the inner-most HEALPix based noise covariance matrix 
product with a corresponding SymPix based product. 


= Y2N2-1Y^ (13) 

where Y 2 denotes an auxiliary SymPix grid; note that 
this does not need to be the same as Yi, but its resolution 
can be adjusted to trade nume rical precision for com ¬ 
putational speed. As shown by Seljebotn et al. (2014), 
Equation [T3| holds true if N 2 is constructed frorn 


02 = W2Y2Yjb.0, 


(14) 

in the same way as N is constructed from 0. In this 
latter expression, W 2 is a diagonal matrix containing 
the quadrature weights used in the spherical harmonic 
analysis of the target grid, while lacks the ring 

weights one normally uses in spherical harmonic analysis. 
Note that this operation is in fact the opposite procedure 
compared to naive resampling, which would be written 
Y 2 Y(() 3 gWobs in our no tation. For full details , we refer 


the interested reader to Seljebotn et al. (2014). 

The precision of Equation |l3| depends on tne relative 
bandlimit of Yi, Y 2 and Yobs- For instance, choosing 
f’max for Y 2 and Yobs to be twice that of Yi yields a nu¬ 
merical precision of 0(10“^°). Increasing these to four 
times that of Yi results in an accuracy of 0(10“^“^), 
whereas reducing it to only one, such that Yi = Y 2 , 
gives an accuracy of 0(10“^). Even the latter may be 
acceptable for preconditioning purposes. 

In order to derive an approxi mati on to the full coeffi¬ 
cient matrix defined by Equation |12[ we first re-write the 
system as 

D-|-B^N^iBx = r, (15) 


where 


D = YiDYf and B = Y 2 BYf. (16) 


We now introduce the approximation that Dij = 0 and 

Bij = 0 whenever two sample points i and j are not 
in the same or neighbouring tiles, as per the SymPix 
organization. The non-zero elements (i.e., the “local” 
part) of D and B are evaluated by Equation 


10 


at a cost 


of O(^max) operations per matrix element. However, as 


discussed in Section [4~H evaluating all required elements 
for a SymPix grid scales as 0{k‘^ ^/A’pix), as opposed to 
0(fc^ -Ypix) for less symmetric grids. 

These calculations constitute essential components of 
the pre-c omputation step of th e multi-grid solver pre¬ 
sented by Seljebotn et al. (2014). In that paper, all eval¬ 
uations were performed with the HEALPix grid, with 


Table 3 

CPU time for constructing preconditioner 



HEALPix 

SymPix 


^max 

(CPU min) 

(CPU min) 

Speed-Up 


Evalution of B^N 

3000 

727 

5.4 

130 

1500 

509 

1.4 

360 

750 

340 

0.37 

920 

375 

230 

0.11 

2100 

188 

452 

0.035 

13 000 

100 

363 

0.027 

13 000 

Sum 

2 621 

7.3 

360 

Evalution of D 

3000 

85 

3.3 

26 

1500 

15 

0.83 

18 

750 

2.4 

0.22 

11 

375 

0.36 

0.07 

5 

188 

0.05 

0.02 

3 

100 

0.01 

0.01 

1 

Sum 

103 

4.5 

23 


Note. — The top section lists the CPU time for precondi¬ 
tioner calculations that depend only on data geometry (mask, 
beam, noise characterization), while the bottom section lists 
the corresponding CPU time for calculations that depend on 
which in CMB applications typically corresponds to an an¬ 
gular power_s 2 ectrumj_C£. The second column is copied directly 
from|Seljebotn et al.| (|2014||, and shows results using HEALPix 
for all calculations, ine third row shows similar results using 
SymPix, while the fourth column shows the ratio between the 
two. 


a computational scaling of O(^maxfc^-Ypix) as discussed 
above. Their Table 2 summarizes the resulting compu¬ 
tational costs in units of CPU minutes. Here we repeat 
those calculations adopting exactly the same overall pa¬ 
rameters, facilitating a one-to-one comparison, but we 
employ SymPix for intermediate calculations instead of 
HEALPix. The results are summarized in Table |3l in 
which the s econd column is copied directly from |Selje-| 


botn et al. (2014), and the third column shows the new 
SymPix results. The fourth column shows the ratio be¬ 
tween the two. 

Clearly, the net gains achieved by the SymPix grid 
varies with resolution. For the high resolution levels the 
speed-up is driven by symmetries drastically reducing the 
time taken to evaluate B. The theoretical speed-up of 
732 times for evaluating B at £max = 3000, found in 
Table is reduced to 130 and 26 for B^N^^B and D, 
respectively. This is due to work that was previously 
unimportant now dominating the computation. At lower 
resolutions the speed-up is almost entirely due to being 
able to use the operator resampling given in Equation 
(13). This degradation procedure is not possible when 
using the HEALPix grid, and so our previous code had 
to use a resolution of Ngide = 2048 along columns and 
level resolution along rows. 

Overall, the SymPix grid reduces what used to be over¬ 
night jobs to essentially interactive tasks. 


5. CONCLUSION 

We have presented SymPix, a novel spherical grid 
for efficient sampling of rotationally invariant opera¬ 
tors. This grid derives many of its properties from the 
Gauss-Legendre grid, ensuring overall excellent spheri- 
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cal harmonics transform performance. The main dif¬ 
ference between the two grids is that SymPix sacrifices 
proper Nyquist sampling in the longitudinal direction in 
order to increase pixel symmetries, such that all grid 
pair distances repeat perfectly along constant-latitude 
rings. This decreases the computational scaling of eval¬ 
uating rotationally invariant operators from 0 {Npix) to 

The intended primary application of the SymPix grid 
is efficient construction of preconditioners (or smoothers) 
for iterative linear solvers. In this paper we considered 
the specific example of drawing constrained Gaussian re¬ 
alizations using a multi-grid solver, which is an important 
problem in current CMB analysis. Comparing with pre- 
vious state-of-the-art r esults based on the HEALPix grid 
(Seljebotn et al. 2014), we achieve average speed-ups of 


360 and 23 for the two most important pre-computation 
steps when using SymPix for internal calculations. 

However, we emphasize that SymPix is a special- 
purpose grid designed for precisely such tasks; it is not 
intended to provide a general purpose spherical pixeliza- 
tion that is suitable for, say, map making. HEALPix 
is clearly preferred for such purposes due to its uniform 
pixel areas, regular pixel window and hierarchical pixel 
structure. Likewise, if machine precision spherical har¬ 
monics transforms are required, the Gauss-Legendre grid 
is the obvious choice. However, for those particular ap¬ 
plications that can benefit from efficient pixel space sam¬ 
pling of linear operators, such as ours, SymPix holds a 
clear edge over existing alternatives. 


DSS and HKE are supported by European Research 
Gouncil grant StG2010-257080. 


APPENDIX 

CODE 

The SymPix code has been developed as part of the Gommmander project, and does not yet have its own library. 
For the benefit of the reader we have however copied the source files relevant to this paper to its own repository 
at http://github.com/dagss/sympix. Please consult the accompanying README file for further details. This 
repository will be updated if the code does eve ntually develop into a stand-a lone package. 


2013), at the time of writing available at 


The SHTs are all done using libsharp (|Reinecke & Seljebotn 
http://sourceforge.net/projects/libsharp/. We then construct the grid geometry in our Python code and feed 
it to libsharp. In the future we may port our Python code to G and make it available directly in libsharp. 
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